Context modeling for text/non-text separation in free-form online handwritten documents

Authors

  • Adrien Delaye
  • Cheng-Lin Liu
Abstract

Free-form online handwritten documents contain a high diversity of content, organized without constraints imposed on the user. The lack of prior knowledge about content and layout makes the modeling of contextual information crucially important for the interpretation of such documents. In this work, we present a comprehensive investigation of the sources of contextual information that can benefit the task of discerning textual from non-textual strokes in online handwritten documents. An in-depth analysis of interactions between strokes is conducted through the design of various pairwise clique systems that are combined within a Conditional Random Field formulation of the stroke labelling problem. Our results demonstrate the benefits of combining complementary sources of context for improving text/non-text recognition performance.
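
As a rough illustration of the idea summarized above, and not the authors' actual formulation, the following sketch shows how several pairwise clique systems can be combined in a single CRF-style energy over stroke labels. Everything here is an assumption made for illustration: the toy unary costs, the Potts-style pairwise cost, the "spatial" and "temporal" neighbour systems, their weights, and the exhaustive search used in place of a real inference algorithm.

```python
from itertools import product

TEXT, NON_TEXT = 0, 1

def energy(labels, unary, clique_systems):
    """Energy of a labelling: unary costs plus weighted pairwise costs."""
    total = sum(unary[i][y] for i, y in enumerate(labels))
    for weight, edges, pair_cost in clique_systems:
        for i, j in edges:
            total += weight * pair_cost[labels[i]][labels[j]]
    return total

def map_labelling(unary, clique_systems):
    """Exhaustive minimisation; only feasible for a handful of strokes."""
    n = len(unary)
    return min(product((TEXT, NON_TEXT), repeat=n),
               key=lambda labels: energy(labels, unary, clique_systems))

# Hypothetical toy data: three strokes, each with a cost per label.
# Stroke 1 is ambiguous on its own and slightly prefers NON_TEXT.
unary = [[0.1, 2.0], [1.0, 0.9], [0.2, 1.8]]

# Potts-style cost: penalise linked strokes that take different labels.
potts = [[0.0, 1.0], [1.0, 0.0]]

# Two hypothetical clique systems: (weight, edges, pairwise cost table).
spatial = (1.0, [(0, 1), (1, 2)], potts)
temporal = (0.5, [(0, 1)], potts)

without_context = map_labelling(unary, [])
with_context = map_labelling(unary, [spatial, temporal])
```

In this toy setting the ambiguous middle stroke is labelled NON_TEXT when only unary costs are used, and TEXT once the two neighbour systems are added, which is the kind of effect that combining complementary context sources is meant to produce.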

Similar resources

Robust Segmentation of Unconstrained Online Handwritten Documents

A segmentation algorithm that can detect different regions of a handwritten document, such as text lines, tables and sketches, would be extremely useful in a variety of applications such as retrieval, translation and genre classification. However, this task is extremely challenging for handwritten documents, which vary considerably in their structure and content. In this paper, we describe a rob...

Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model

To facilitate the entry of data into the computer and its digitization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...

Directional Stroke Width Transform to Separate Text and Graphics in City Maps

City maps are among the complex documents found in the real world. In these maps, text labels overlap with graphics and appear in a variety of fonts and styles in different orientations. Usually, text and graphic colours are not predefined because map publishers vary. In most city maps, text and graphic lines form a single connected component. Moreover, the common regions of text and graphic lin...

Text line and word segmentation of handwritten documents

In this paper, we present a methodology for segmenting handwritten documents into their distinct entities, namely text lines and words. Text line segmentation is achieved by applying the Hough transform to a subset of the connected components of the document image. A post-processing step includes the correction of possible false alarms, the detection of text lines that Hough transform failed to create and...

Compression of Scan Digitized Handwritten Text for Indian Language Document

Document image compression is used for the speedy transmission of data over the web. This paper deals with an effective compression scheme for handwritten gray-level documents in Devnagri script. Current OCR technology is not effective at handling handwritten textual images. The proposed compression scheme is based on the separation of the foreground and background of the image. Experimen...

Publication date: 2013